Abstract: BotometerLite is advertised as a lightweight bot detector that improves scalability by relying only on user profile information; its developers claim that using fewer features entails only a small compromise in individual accuracy. We test the validity of this claim by comparing Botometer and BotometerLite bot likelihood scores for 75,000 users across 5 data sets. We randomly sampled 15,000 users from each of the following data sets: Coronavirus, 2016 election, News outlets, Charlottesville, and the Twitter API. BotometerLite scores varied drastically from Botometer scores.
Botometer is one of the most popular bot detection tools used in social science (Rauchfleisch and Kaiser 2020). However, due to Botometer API rate limits, Beskow et al. (2018) recommend a tiered framework for bot detection and suggest that models relying only on user profile information can be used at scale for general estimates of bot penetration.
Yuan, Schuchard, and Crooks (2019) used DeBot for large-scale bot annotation when examining tweets related to the 2015 California Disneyland measles outbreak, whereas Broniatowski, Hilyard, and Dredze (2016) used Botometer for small-scale bot annotation.
Dunn et al. (2020) annotated bots based on Botometer scores of 0.5 or greater when assessing the limited role of bots in spreading vaccine-critical information. Botometer’s FAQ page explicitly states: “It’s tempting to set some arbitrary threshold score and consider everything above that number a bot and everything below a human, but we do not recommend this approach. Binary classification of accounts using two classes is problematic because few accounts are completely automated”. Instead, Botometer recommends setting a threshold on the CAP score. Dunn et al. (2020) acknowledge that the number of bots was likely overestimated in their study and that this was unlikely to affect the study’s results.
Botometer was initially launched in May 2014 and BotometerLite was released in September 2020. BotometerLite improves scalability by relying only on user profile information; its developers claim that using fewer features entails only a small compromise in individual accuracy (Yang et al. 2020). The training and performance evaluation of BotometerLite are described in “Scalable and Generalizable Social Bot Detection through Data Selection” (Yang et al. 2020).
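As a concrete illustration of the profile-only approach, the sketch below computes a few features of the kind BotometerLite draws on (follower and friend counts, posting volume, account age); the exact feature set is specified in Yang et al. (2020), and the Twitter v1.1 user-object fields used here are only illustrative.

```python
from datetime import datetime, timezone

def profile_features(user):
    """Illustrative profile-only features, in the spirit of BotometerLite.

    `user` is a Twitter API v1.1 user object (dict). The actual feature set
    used by BotometerLite is described in Yang et al. (2020); the fields
    below are examples of information available without collecting tweets.
    """
    created_at = datetime.strptime(user["created_at"], "%a %b %d %H:%M:%S %z %Y")
    age_days = max((datetime.now(timezone.utc) - created_at).days, 1)
    return {
        "followers_count": user["followers_count"],
        "friends_count": user["friends_count"],
        "statuses_count": user["statuses_count"],
        "screen_name_length": len(user["screen_name"]),
        "tweets_per_day": user["statuses_count"] / age_days,  # activity rate proxy
    }
```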
Rauchfleisch and Kaiser (2020) found that Botometer scores are imprecise at estimating bots, especially for accounts tweeting in languages other than English, are prone to variance over time, and misclassify a high number of human users as bots and vice versa.
Many researchers annotate bots using Botometer score thresholds based on precedent established in previous literature (add citations). Understanding how BotometerLite performs in comparison to Botometer is critical before treating it as a scalable substitute for Botometer.
In this study, we seek to answer the following questions:
The Botometer FAQ describes bot sub-scores for the following categories:
The Complete Automation Probability (CAP) describes the probability, according to the Botometer model, that an account with this bot score or greater is a bot.
The following preliminary results explore the similarity between Botometer and BotometerLite scores for 10,000 users sampled from a 5G conspiracy theory tweet set.
I am currently collecting bot scores from the GWU Tweet Sets library to be used in the final paper.
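A minimal sketch of how both sets of scores can be collected with the botometer-python package, assuming valid RapidAPI and Twitter credentials; the class and method names follow that package’s documentation and should be checked against the current release, and the user IDs are placeholders.

```python
import botometer

# Placeholder credentials (assumed to be supplied by the researcher).
rapidapi_key = "YOUR_RAPIDAPI_KEY"
twitter_app_auth = {
    "consumer_key": "YOUR_CONSUMER_KEY",
    "consumer_secret": "YOUR_CONSUMER_SECRET",
    "access_token": "YOUR_ACCESS_TOKEN",
    "access_token_secret": "YOUR_ACCESS_TOKEN_SECRET",
}

user_ids = [12345, 67890]  # hypothetical user IDs sampled from the tweet sets

# Botometer (full model): one rate-limited API call per account.
bom = botometer.Botometer(wait_on_ratelimit=True,
                          rapidapi_key=rapidapi_key,
                          **twitter_app_auth)
botometer_results = {uid: result for uid, result in bom.check_accounts_in(user_ids)}

# BotometerLite: scores accounts in bulk from profile information only.
# Method name as given in the botometer-python documentation.
blt = botometer.BotometerLite(rapidapi_key=rapidapi_key,
                              consumer_key=twitter_app_auth["consumer_key"],
                              consumer_secret=twitter_app_auth["consumer_secret"])
lite_results = blt.check_accounts_from_user_ids(user_ids)
```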
BotometerLite scores are most similar to the Botometer fake follower and spammer sub-scores, with \(R^2\) values of 0.394 and 0.334, respectively. Hence, if the Botometer scores are accurate, BotometerLite may be somewhat effective at identifying fake followers and spammers.
The Pearson correlation matrix (the \(R^2\) values above are the squares of the corresponding entries of this matrix) also shows that the scores are weakly correlated.
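These comparisons can be reproduced with a short pandas script; the sketch below assumes a table with one row per user and hypothetical column names for the BotometerLite score and two Botometer sub-scores.

```python
import pandas as pd

# Hypothetical score table; in practice, e.g. scores = pd.read_csv("botscores.csv")
scores = pd.DataFrame({
    "botometer_lite": [0.21, 0.80, 0.55, 0.10],
    "fake_follower":  [0.30, 0.70, 0.60, 0.05],
    "spammer":        [0.10, 0.65, 0.40, 0.15],
})

# Pearson correlation matrix between all score columns.
corr = scores.corr(method="pearson")

# R^2 of BotometerLite against each Botometer sub-score is the square
# of the corresponding Pearson correlation coefficient.
r_squared = corr.loc["botometer_lite"].drop("botometer_lite") ** 2
print(corr)
print(r_squared)
```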
27% of the sample users have a Complete Automation Probability (CAP) of 0.75 or greater. Hence, if we apply a threshold of 0.75 to annotate bots in the data, roughly 1 out of 4 users in our sample would be labeled as a bot.
Likewise, 53% of the sample users have a CAP of 0.5 or greater, meaning that with a threshold of 0.5 over half of the users would be labeled as bots. This seems highly unlikely.
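The threshold counts above follow directly from the CAP values; a minimal sketch, assuming a hypothetical `cap` column with one value per sampled user:

```python
import pandas as pd

# Hypothetical data: one CAP value per sampled user.
scores = pd.DataFrame({"cap": [0.82, 0.40, 0.77, 0.51, 0.12, 0.95]})

for threshold in (0.75, 0.5):
    labeled_bot = scores["cap"] >= threshold
    share = labeled_bot.mean()  # fraction of users that would be annotated as bots
    print(f"CAP >= {threshold}: {share:.0%} of sampled users labeled as bots")
```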
Future work for course project:
Questions:
Beskow, David, Kathleen M Carley, Halil Bisgin, Ayaz Hyder, Chris Dancy, and Robert Thomson. 2018. “Introducing BotHunter: A Tiered Approach to Detecting and Characterizing Automated Activity on Twitter.” In International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer.
Broniatowski, David A, Karen M Hilyard, and Mark Dredze. 2016. “Effective Vaccine Communication During the Disneyland Measles Outbreak.” Vaccine 34 (28). Elsevier: 3225–8.
Dunn, Adam G, Didi Surian, Jason Dalmazzo, Dana Rezazadegan, Maryke Steffens, Amalie Dyda, Julie Leask, Enrico Coiera, Aditi Dey, and Kenneth D Mandl. 2020. “Limited Role of Bots in Spreading Vaccine-Critical Information Among Active Twitter Users in the United States: 2017–2019.” American Journal of Public Health 110 (S3). American Public Health Association: S319–S325.
Rauchfleisch, Adrian, and Jonas Kaiser. 2020. “The False Positive Problem of Automatic Bot Detection in Social Science Research.” Berkman Klein Center Research Publication, no. 2020-3.
Yang, Kai-Cheng, Onur Varol, Pik-Mai Hui, and Filippo Menczer. 2020. “Scalable and Generalizable Social Bot Detection Through Data Selection.” In Proceedings of the AAAI Conference on Artificial Intelligence, 34 (01): 1096–1103.
Yuan, Xiaoyi, Ross J Schuchard, and Andrew T Crooks. 2019. “Examining Emergent Communities and Social Bots Within the Polarized Online Vaccination Debate in Twitter.” Social Media + Society 5 (3). SAGE Publications: 2056305119865465.